Maximum-likelihood affine cepstral filtering (MLACF) technique for speaker normalization
نویسنده
چکیده
We present a novel technique of minimizing the acoustic variability of speakers by transforming the features extracted from the speaker’s data to better fit the recognition model. The concept of maximum-likelihood affine cepstral filtering (MLACF) will be introduced for feature transformation, along with solutions for the transformation parameters that maximize the likelihood of the test data with respect to a given recognition model. It is shown that for log-concave distributions, the solution of the MLACF problem can be obtained using convex programming. HMM-based digit recognition on the TIDIGITS database is presented to demonstrate the flexibility of the transformation in compensating for large acoustic mismatches between the speakers in the training and test database. In addition, it will be shown that the technique requires estimation of far fewer transformation parameters compared to existing techniques, thus allowing fast, real-time compensation.
منابع مشابه
Joint Cohort Normalization in a Multi-Feature Speaker Verification System
In this paper we propose a new fusion technique, termed Joint Cohort Normalization Fusion, where the information fusion is done prior to the likelihood ratio test in a speaker verification system. The performance of the technique is compared against two popular types of fusion: feature vector concatenation and expert opinion fusion, for fusion of Mel Frequency Cepstral Coefficients (MFCC), MFCC...
متن کاملSpeaker Normalization with All-pass Transforms Center for Language and Speech Processing 72 Speaker Normalization with All-pass Transforms
Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a particular speaker’s speech is rescaled or warped prior to the extraction ...
متن کاملSpeaker dependent model order selection of spectral envelopes
This work introduces a maximum-likelihood based model order (MO) selection technique for spectral envelopes to apply speaker dependent adaptation in the feature-space similar to vocal tract length normalization. Speech recognition systems based on spectral envelopes are using a fixed MO for the underlying linear parametric model. Using a fixed MO over different speakers or channels might not be...
متن کاملNoise robust speaker verification with delta cepstrum normalization
This paper introduces a delta cepstrum normalization (DCN) technique for speaker verification under noisy conditions. Cepstral feature normalization techniques are widely used to mitigate spectral variations caused by various types of noise; however, little attention has been paid to normalizing delta features. A DCN technique that normalizes not only base features but also delta-features was r...
متن کاملMLLR transforms as features in speaker recognition
We explore the use of adaptation transforms employed in speech recognition systems as features for speaker recognition. This approach is attractive because, unlike standard framebased cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification. Affine transforms are computed for the Gaussian means of the acoustic models used in a re...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001